COVID-19 in India: What Went So Terribly Wrong?

Motivation

Recently, India has experienced a sharp increase in COVID-19 cases and deaths. With medical resources stretched thin and poor living conditions in many areas, hospitals have had to turn many people away in favor of patients with more severe cases. Until now, it seemed as if India was handling the pandemic well. So what changed? In this notebook we explore that question and investigate which locations within India are hardest hit and need medical resources the most.

Data Sources

We will start by importing the libraries we need. The requests library downloads the .json file from the internet and returns its contents as JSON. The json library then writes that data to a local JSON file and loads it back as a dictionary. Finally, we use pandas to convert the dictionary into a dataframe so that we can more easily plot and visualize the data.

Because the data is published as JSON, we need to download the file from the web and parse it into a Python data structure. We will start with the all_totals.json file, which contains the number of active cases, the number of deaths, the number of people cured, and the total number of confirmed cases, each with an associated timestamp.
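A minimal sketch of the download step, assuming the file is reachable at a plain HTTP(S) URL. The helper names `download_json` and `load_json`, the `url` argument, and the local path are illustrative choices; the notebook's actual URL is not reproduced here.

```python
import json

import requests


def download_json(url, path, timeout=30):
    """Fetch `url`, save the decoded JSON to `path`, and return it as a dict."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    payload = response.json()
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)
    return payload


def load_json(path):
    """Read a previously saved JSON file back into a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Calling `download_json(url, "all_totals.json")` once and `load_json("all_totals.json")` on later runs avoids downloading the file again.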

As you can see, the data is stored in a key-value pair format, where the key contains the timestamp and attribute name, while the value contains the number associated with that attribute. We can wrangle this data format into a more table-like structure so that we can convert this dictionary into a dataframe.
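The reshaping can be sketched as follows. The exact layout of the file is an assumption here: each entry is taken to be a `{"key": [timestamp, attribute], "value": number}` record under a top-level `"rows"` list, and the `sample` dictionary is made up for illustration.

```python
import pandas as pd

# Hypothetical sample that mimics the assumed key-value layout.
sample = {
    "rows": [
        {"key": ["2020-01-30T13:33:00.00+05:30", "active_cases"], "value": 1},
        {"key": ["2020-01-30T13:33:00.00+05:30", "deaths"], "value": 0},
        {"key": ["2020-02-02T10:39:00.00+05:30", "active_cases"], "value": 2},
        {"key": ["2020-02-02T10:39:00.00+05:30", "deaths"], "value": 0},
    ]
}


def totals_to_frame(payload):
    """Turn key-value records into a table with one row per date and one column per attribute."""
    records = [
        {"timestamp": row["key"][0], "metric": row["key"][1], "value": row["value"]}
        for row in payload["rows"]
    ]
    long_df = pd.DataFrame.from_records(records).sort_values("timestamp")
    # Keep only the calendar date (first 10 characters of the ISO timestamp).
    long_df["date"] = pd.to_datetime(long_df["timestamp"].str[:10])
    # If a day has several updates, keep the latest value for each attribute.
    wide = long_df.pivot_table(index="date", columns="metric", values="value", aggfunc="last")
    wide.columns.name = None
    return wide.sort_index()


totals = totals_to_frame(sample)
```

Each attribute name becomes a column, which is the shape pandas plotting functions expect.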

There is a problem with the dates: they are not continuous! To visualize the data as a time series, we need a continuous date range. We can get one by adding a row for each missing day and copying the previous row's values into it. Filling the gaps this way is reasonable because all of the metrics in the dataset are cumulative.
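One way to fill the gaps, sketched under the assumption that the dataframe is indexed by date (the function name `fill_missing_days` is illustrative):

```python
import pandas as pd


def fill_missing_days(df):
    """Insert a row for every missing calendar day and carry the previous values forward."""
    full_range = pd.date_range(df.index.min(), df.index.max(), freq="D")
    # reindex adds empty rows for the missing days; ffill copies the last known values into them.
    return df.reindex(full_range).ffill()
```

For example, a table with rows for 1 January and 4 January gains rows for 2 and 3 January, both holding the 1 January values.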

Now that we have the data in a manageable format, we can start visualizing it. Let's start by plotting the number of active cases, the total number of confirmed cases, the number of deaths, and the number of people cured.
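A minimal plotting sketch with matplotlib, assuming a date-indexed dataframe. The helper name `plot_totals` and any column names passed to it (for example `"deaths"` or `"cured"`) are hypothetical and should be replaced with the names in the actual dataframe.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_totals(df, columns):
    """Draw one line chart per column, stacked vertically with a shared date axis."""
    fig, axes = plt.subplots(
        len(columns), 1, figsize=(10, 3 * len(columns)), sharex=True, squeeze=False
    )
    for ax, column in zip(axes[:, 0], columns):
        ax.plot(df.index, df[column])
        ax.set_title(column.replace("_", " ").title())
        ax.set_ylabel("People")
    axes[-1, 0].set_xlabel("Date")
    fig.tight_layout()
    return fig
```

Giving each metric its own axis keeps the smaller series, such as deaths, readable next to the much larger case counts.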

We can clearly see from these plots that cases and deaths have been skyrocketing since late March or early April 2021. This coincides with the spread of a new strain of the virus in India. But what made this new strain so much harder to handle than previous ones, and does the situation differ from state to state?

We can see that some entries (a lot, actually!) are labelled 'unassigned'. For now we will drop these rows from the table, although further analysis would be needed to understand how they affect the data overall.
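Dropping those rows can be sketched as below. The column name `state` and the helper name `drop_unassigned` are assumptions, since the table's actual column names are not shown here.

```python
import pandas as pd


def drop_unassigned(df, state_col="state"):
    """Remove rows whose location is labelled 'unassigned' (case-insensitive)."""
    mask = df[state_col].str.lower().eq("unassigned")
    return df.loc[~mask].reset_index(drop=True)
```

Comparing `len(df)` before and after the call is a quick check on how much data is being discarded.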

Now we can visualize the data on a map. The idea is to create three maps, one each for cases, cured, and deaths, in which states are colored by their relative counts. The first step is to divide the counts by one million so that they are expressed in millions. This makes the scales in our maps more readable.
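The scaling step amounts to a single division. In this sketch the helper name `scale_to_millions` and the column names passed to it are hypothetical.

```python
import pandas as pd


def scale_to_millions(df, columns):
    """Return a copy of df with the given count columns expressed in millions."""
    scaled = df.copy()
    scaled[columns] = scaled[columns] / 1_000_000
    return scaled
```

Working on a copy leaves the original counts available for later analysis.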

We will use the folium library, which is a wrapper around the leaflet.js library, to display the maps. Note that these types of visualizations are called choropleths.

Choropleth Tutorial: https://vverde.github.io/blob/interactivechoropleth.html

Choropleth Documentation: https://python-visualization.github.io/folium/quickstart.html#Choropleth-maps

From these plots, we can see that things seem pretty dire in Maharashtra, which has the greatest number of deaths, cases, and cured people. This makes sense because Mumbai, India's largest city, is located in Maharashtra, so the state contains some very dense urban areas. In a city, especially a dense one, it is more difficult to socially distance, making it easier for COVID-19 to spread.

Looking at the other states, it appears that states in Southern India have higher numbers of cases, deaths, and cured people than states in Northern India. Kerala, which is in Southern India, is an interesting case study: while its numbers of cases and cured people are similar to those of its neighboring states, such as Tamil Nadu and Karnataka, it has significantly fewer deaths.